
    Depth map compression via 3D region-based representation

    In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometry inconsistencies in the synthesized views. In this paper, a new 3D plane representation of the scene is presented which improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed for generating a color and a depth segmentation. Using both partitions, depth maps are segmented into regions free of sharp discontinuities, without having to explicitly signal all depth edges. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows efficient encoding while preserving the 3D characteristics of the scene. The 3D planes also open up the possibility of coding multiview images with a single representation.
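The paper's exact plane parameterization is not given in the abstract; as a minimal sketch, a segmented depth region can be approximated by a least-squares plane `z = a*x + b*y + c`, so the whole region is coded with just three parameters instead of per-pixel depths:

```python
import numpy as np

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to the pixels of one
    segmented region. points: (N, 3) array of (x, y, depth) samples.
    Returns (a, b, c), a three-parameter code for the region."""
    A = np.c_[points[:, 0], points[:, 1], np.ones(len(points))]
    coeffs, *_ = np.linalg.lstsq(A, points[:, 2], rcond=None)
    return coeffs

# A synthetic planar region with depth = 0.5*x - 0.2*y + 10
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
z = 0.5 * xs - 0.2 * ys + 10
pts = np.c_[xs.ravel(), ys.ravel(), z.ravel()]
a, b, c = fit_plane(pts)  # recovers 0.5, -0.2, 10
```

Regions that are not perfectly planar would incur a modeling error, which is why the segmentation step above avoids regions containing sharp depth discontinuities.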

    2D-3D Geometric Fusion Network using Multi-Neighbourhood Graph Convolution for RGB-D Indoor Scene Classification

    Multi-modal fusion has been shown to enhance the performance of scene classification tasks. This paper presents a 2D-3D Fusion stage that combines 3D Geometric Features with 2D Texture Features obtained by 2D Convolutional Neural Networks. To obtain a robust 3D Geometric embedding, a network that uses two novel layers is proposed. The first layer, Multi-Neighbourhood Graph Convolution, aims to learn a more robust geometric descriptor of the scene by combining two different neighbourhoods: one in the Euclidean space and the other in the Feature space. The second proposed layer, Nearest Voxel Pooling, improves the performance of the well-known Voxel Pooling. Experimental results, using the NYU-Depth-V2 and SUN RGB-D datasets, show that the proposed method outperforms the current state-of-the-art in the RGB-D indoor scene classification task.
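The actual layer is a learned graph convolution; as a toy sketch under that caveat, the "two neighbourhoods" idea can be illustrated by aggregating each point's features over both a k-NN graph in xyz space and a k-NN graph in feature space, then concatenating the two descriptors (mean aggregation stands in for the learned operation):

```python
import numpy as np

def knn(vectors, k):
    """Indices of the k nearest neighbours of each row (self excluded)."""
    d = np.linalg.norm(vectors[:, None] - vectors[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def multi_neighbourhood_aggregate(xyz, feats, k=3):
    """Aggregate features over a Euclidean (xyz) neighbourhood and a
    feature-space neighbourhood, then concatenate both descriptors."""
    geo = knn(xyz, k)    # neighbours by 3D position
    sem = knn(feats, k)  # neighbours by feature similarity
    return np.concatenate([feats[geo].mean(1), feats[sem].mean(1)], axis=1)

rng = np.random.default_rng(0)
xyz = rng.normal(size=(10, 3))
feats = rng.normal(size=(10, 5))
desc = multi_neighbourhood_aggregate(xyz, feats, k=3)  # shape (10, 10)
```

The feature-space neighbourhood lets points that are semantically similar but spatially distant exchange information, which the Euclidean graph alone cannot do.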

    SkinningNet: two-stream graph convolutional neural network for skinning prediction of synthetic characters

    This work presents SkinningNet, an end-to-end Two-Stream Graph Neural Network architecture that computes skinning weights from an input mesh and its associated skeleton, without making any assumptions about the shape class or structure of the provided mesh. Whereas previous methods pre-compute handcrafted features that relate the mesh and the skeleton or assume a fixed topology of the skeleton, the proposed method extracts this information in an end-to-end learnable fashion by jointly learning the best relationship between mesh vertices and skeleton joints. The proposed method exploits the benefits of the novel Multi-Aggregator Graph Convolution, which combines the results of different aggregators during the summarizing step of the Message-Passing scheme, helping the operation generalize to unseen topologies. Experimental results demonstrate the effectiveness of the contributions of our novel architecture, with SkinningNet outperforming current state-of-the-art alternatives. This work has been partially supported by the project PID2020-117142GB-I00, funded by MCIN/AEI/10.13039/501100011033.
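The abstract does not specify which aggregators are combined or how; as a hedged illustration of the general idea, the summarizing step of message passing can concatenate the outputs of several simple aggregators (max, mean, sum) over a vertex's incoming messages, in place of committing to a single one:

```python
import numpy as np

def multi_aggregator(messages):
    """Summarize a vertex's incoming messages with several aggregators
    at once (max, mean, sum, concatenated). A stand-in for the learned
    combination in the paper, whose exact form is not given here.

    messages: (M, F) array, one row per incoming message."""
    return np.concatenate([messages.max(0),
                           messages.mean(0),
                           messages.sum(0)])

msgs = np.arange(12.0).reshape(4, 3)   # 4 messages, 3 features each
out = multi_aggregator(msgs)           # shape (9,): [9,10,11, 4.5,...]
```

Exposing several aggregation statistics at once is a common trick for making a graph convolution less sensitive to the degree and topology of unseen graphs, consistent with the generalization claim above.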

    Learning task-specific features for 3D pointcloud graph creation

    Processing 3D pointclouds with Deep Learning methods is not an easy task. A common choice is to do so with Graph Neural Networks, but this framework requires creating edges between points that are not explicitly related to one another. Historically, naive and handcrafted methods like k Nearest Neighbors (k-NN) or query ball point over xyz features have been used, with attention focused more on improving the network than on improving the graph. In this work, we propose a more principled way of creating a graph from a 3D pointcloud. Our method performs k-NN over a transformation of the input 3D pointcloud. This transformation is done by a Multi-Layer Perceptron (MLP) with learnable parameters that is optimized through backpropagation jointly with the rest of the network. We also introduce a regularization method based on stress minimization, which allows controlling how far the learned graph is from our baseline: k-NN over xyz space. This framework is tested on ModelNet40, where graphs generated by our network outperformed the baseline by 0.3 points in overall accuracy.
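The two ingredients described above can be sketched compactly: an MLP maps xyz points to an embedding, k-NN is computed in that embedding, and a stress term penalizes the gap between pairwise distances in the learned space and in raw xyz space. This is only an illustrative, non-learned version (fixed random weights instead of backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(points, W1, W2):
    """Toy two-layer perceptron mapping xyz to a learned embedding.
    In the paper these weights are trained jointly with the network."""
    return np.maximum(points @ W1, 0) @ W2

def knn_graph(embed, k):
    """k-NN edges computed in the embedded space (self excluded)."""
    d = np.linalg.norm(embed[:, None] - embed[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def stress(embed, xyz):
    """Stress regularizer: squared mismatch between pairwise distances
    in the learned space and in xyz space. Zero means the learned graph
    coincides with the k-NN-over-xyz baseline."""
    de = np.linalg.norm(embed[:, None] - embed[None, :], axis=-1)
    dx = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
    return ((de - dx) ** 2).sum()

pts = rng.normal(size=(16, 3))
W1, W2 = rng.normal(size=(3, 8)), rng.normal(size=(8, 3))
edges = knn_graph(mlp(pts, W1, W2), k=4)  # (16, 4) neighbour indices
```

Weighting the stress term controls the trade-off the abstract mentions: a large weight pulls the learned graph back toward the plain xyz k-NN baseline, a small one lets the network rewire it freely.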

    Comparison of MPEG-7 descriptors for long term selection of reference frames

    In recent years, the amount of multimedia content has greatly increased. This has multiplied the need for efficient compression of the content, but also for the ability to search, retrieve, browse, or filter it. Generally, video compression and indexing have been investigated separately. However, as the amount of multimedia content grows, it becomes increasingly interesting to study representations that provide good compression and indexing functionalities at the same time. Moreover, even if the indexing metadata is created for functionalities such as search, retrieval, and browsing, it can also be employed to increase the efficiency of current video codecs. Here, we use it to improve the long term prediction step of the H.264/AVC video codec. This paper compares four different MPEG-7 descriptors when used in the proposed scheme.
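The abstract does not detail how a descriptor drives the reference choice; as a hedged sketch, one plausible reading is that the long-term reference is the buffered frame whose descriptor lies closest to the current frame's descriptor (L1 distance here; descriptors are treated as plain float vectors, and the hook into the H.264/AVC prediction loop is not shown):

```python
def select_reference(current_desc, buffer_descs):
    """Index of the buffered frame whose descriptor is closest (L1)
    to the current frame's descriptor -- an illustrative criterion,
    not necessarily the one used in the paper."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(range(len(buffer_descs)),
               key=lambda i: l1(current_desc, buffer_descs[i]))

# Frame 1's descriptor is the closest match, so it is selected.
best = select_reference([0.2, 0.8],
                        [[0.9, 0.1], [0.25, 0.75], [0.5, 0.5]])
```

Swapping the distance function is exactly where the four MPEG-7 descriptors compared in the paper would differ in practice.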

    Gesture controlled interactive rendering in a panoramic scene

    The demonstration described hereafter covers technical work carried out in the FascinatE project [1], related to the interactive retrieval and rendering of high-resolution panoramic scenes. The scenes have been captured by a special panoramic camera (the OMNICAM) [2], which captures high-resolution video with a wide-angle (180-degree) field of view. Users access the content through a novel device-less, markerless gesture-based system that allows them to interact as naturally as possible, controlling the rendering by zooming, panning, or framing through the panoramic scene.

    Analysis of the economic dependence of the Republic of Ecuador on the People's Republic of China, 2008–2014 period

    Ecuador has a primary-export production model based on the extraction and sale of oil and traditional products, which conditions the country's economic development. Under the government of economist Rafael Correa, a political-economic model was introduced that aimed to achieve industrialization by transforming the productive matrix. However, this objective was not reached. One of the causes is the strengthening of bilateral relations with China, which confirmed its position among Ecuador's main trading partners. China also increased its flow of foreign direct investment, concentrated in mining and quarrying, and became the main source of financing for the construction of Ecuador's flagship projects. In this respect, China represents a strategic partner for the country. However, while the Asian country maintains constant growth, Ecuador has not consolidated its industrialization process. The evolution of bilateral relations along these three axes creates an economic dependence on China and thereby confirms Ecuador's position as an underdeveloped country. This reality is studied through Dependency Theory, which provides clear guidelines for examining the stated problem.

    Spatio-temporal road detection from aerial imagery using CNNs

    The main goal of this paper is to detect roads in aerial imagery recorded by drones. To achieve this, we propose a modification of SegNet, a deep fully convolutional neural network for image segmentation. In order to train this neural network, we have assembled a database containing videos of roads from the point of view of a small commercial drone. Additionally, we have developed an image annotation tool based on the watershed technique, in order to perform semi-automatic labeling of the videos in this database. The experimental results using our modified version of SegNet show a substantial improvement in the network's performance on aerial imagery, achieving over 90% accuracy.
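The abstract does not state which accuracy metric the 90% figure refers to; assuming it is plain pixel-wise accuracy of the predicted road mask against the ground truth (an assumption, not confirmed by the source), the evaluation reduces to:

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted road/non-road label matches
    the ground-truth mask (1 = road, 0 = background)."""
    return float((pred == gt).mean())

pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
gt   = np.array([[1, 0, 0],
                 [0, 1, 0]])
acc = pixel_accuracy(pred, gt)  # 5 of 6 pixels agree
```

Note that on road imagery the background class dominates, so pixel accuracy can look high even for mediocre masks; class-balanced metrics such as per-class IoU are commonly reported alongside it.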

    Segmentation-based multi-scale edge extraction to measure the persistence of features in unorganized point clouds

    Edge extraction has attracted a lot of attention in computer vision. The accuracy of edge extraction in point clouds can be a significant asset in a variety of engineering scenarios. To this end, we propose a segmentation-based multi-scale edge extraction technique. In this approach, different regions of a point cloud are segmented by a global analysis according to the geodesic distance. Afterwards, a multi-scale operator is defined over local neighborhoods. By applying this operator at multiple scales of the point cloud, the persistence of features is determined. We illustrate the proposed method by computing a feature weight that measures the likelihood of a point being an edge, then detecting the edge points based on that value at both global and local scales. Moreover, we evaluate our method quantitatively and qualitatively. Experimental results show that the proposed approach achieves superior accuracy. Furthermore, we demonstrate the robustness of our approach on noisier real-world datasets.
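The paper's exact feature weight and geodesic segmentation are not reproduced here; a common per-point edge likelihood of the kind described is the surface variation (smallest eigenvalue ratio of the local covariance), and "persistence" can be sketched as requiring the value to stay high across several neighborhood radii:

```python
import numpy as np

def surface_variation(points, idx, radius):
    """Surface variation lambda_0 / (lambda_0 + lambda_1 + lambda_2)
    of the covariance of points within `radius` of point `idx`.
    Near 0 on flat patches, larger on creases and edges."""
    d = np.linalg.norm(points - points[idx], axis=1)
    nb = points[d < radius]
    if len(nb) < 3:
        return 0.0
    ev = np.linalg.eigvalsh(np.cov(nb.T))  # ascending eigenvalues
    return float(ev[0] / ev.sum())

def multiscale_persistence(points, idx, radii):
    """A point persists as an edge only if its variation is high at
    every scale; taking the minimum enforces that."""
    return min(surface_variation(points, idx, r) for r in radii)

# Interior point of a flat plane: variation stays near zero at all scales.
grid = np.array([[x, y, 0.0] for x in range(5) for y in range(5)])
score = multiscale_persistence(grid, 12, radii=[1.5, 2.5])
```

Demanding persistence across scales is what filters out spurious responses caused by noise at a single small radius, matching the robustness claim on noisy real-world data.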